tts service
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Chen, Haozhe, Chen, Run, Hirschberg, Julia
While recent advances in Text-to-Speech (TTS) technology produce natural and expressive speech, they lack the option for users to select emotion and control intensity. We propose EmoKnob, a framework that allows fine-grained emotion control in speech synthesis with few-shot demonstrative samples of arbitrary emotion. Our framework leverages the expressive speaker representation space made possible by recent advances in foundation voice cloning models. Based on the few-shot capability of our emotion control framework, we propose two methods to apply emotion control on emotions described by open-ended text, enabling an intuitive interface for controlling a diverse array of nuanced emotions. To facilitate a more systematic emotional speech synthesis field, we introduce a set of evaluation metrics designed to rigorously assess the faithfulness and recognizability of emotion control frameworks. Through objective and subjective evaluations, we show that our emotion control framework effectively embeds emotions into speech and surpasses emotion expressiveness of commercial TTS services.
- North America > United States (0.46)
- Europe > Monaco (0.04)
- Asia > Middle East > Jordan (0.04)
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI
Rownicka, Joanna, Sprenkamp, Kilian, Tripiana, Antonio, Gromoglasov, Volodymyr, Kunz, Timo P
We describe our approach to create and deliver a custom voice for a conversational AI use-case. More specifically, we provide a voice for a Digital Einstein character, to enable human-computer interaction within the digital conversation experience. To create the voice which fits the context well, we first design a voice character and we produce the recordings which correspond to the desired speech attributes. We then model the voice. Our solution utilizes Fastspeech 2 for log-scaled mel-spectrogram prediction from phonemes and Parallel WaveGAN to generate the waveforms. The system supports a character input and gives a speech waveform at the output. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables for fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real-time.
A Voice Controlled E-Commerce Web Application
Kandhari, Mandeep Singh, Zulkernine, Farhana, Isah, Haruna
Abstract-- Automatic voice-controlled systems have changed the way humans interact with a computer. Voice or speech recognition systems allow a user to make a hands-free request to the computer, which in turn processes the request and serves the user with appropriate responses. After years of research and developments in machine learning and artificial intelligence, today voice-controlled technologies have become more efficient and are widely applied in many domains to enable and improve human-tohuman andhuman-to-computer interactions. The state-of-the-art e-commerce applications with the help of web technologies offer interactive and user-friendly interfaces. However, there are some instances where people, especially with visual disabilities, are not able to fully experience the serviceability of such applications. A voice-controlled system embedded in a web application can enhance user experience and can provide voice as a means to control the functionality of e-commerce websites. In this paper, we propose a taxonomy of speech recognition systems (SRS) and present a voice-controlled commodity purchase e-commerce application using IBM Watson speech-to-text to demonstrate its usability. The prototype can be extended to other application scenarios such as government service kiosks and enable analytics of the converted text data for scenarios such as medical diagnosis at the clinics. I. INTRODUCTION Voice recognition is used interchangeably with speech recognition, however, voice recognition is primarily the task of determining the identity of a speaker rather than the content of the speaker's speech [1].
- North America > Canada > Ontario > Kingston (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > Hawaii (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Research Report (1.00)
- Overview (0.68)
- Information Technology > Services > e-Commerce Services (1.00)
- Education > Educational Setting (0.93)